Abstract


A class of correlation-based random variables networks is considered. A
method to identify all cliques of a fixed size k with a given probability P* is
proposed. The method is based on the construction of two sets, which we call
the upper set of cliques and the low set of cliques. We prove that, simultaneously
with probability P*, the set of all cliques of size k in the true threshold graph
is included in the upper set of cliques and includes all cliques from the low set
of cliques. Several algorithms that simultaneously decrease the upper set
and increase the low set of cliques without loss of significance are constructed
and discussed. An example of stock market analysis using the method is
given.


Keywords — network, threshold graph, cliques, correlation, multiple hypotheses 
testing procedure, family-wise error rate, level of significance. 


1 Introduction 


The problem of detecting statistically significant communities in networks has attracted
growing attention in recent decades [7], [3], [19], [1], [17]. In this article we consider the
problem of statistically significant detection of cliques in correlation networks. A clique in
a network represents a community with the highest density of links between nodes.

We consider a class of correlation-based random variables networks [14]. Such
networks appear in biological and medical studies [19], in gene expression and gene
co-expression analysis [6], in market network analysis [8], [9], [15], in climate network
analysis [18], and elsewhere. Our main goal is to identify cliques in the so-called true
threshold graph ([14], chapter 3). A particular case of a threshold graph is the market
graph, popular in market network analysis [8]. We propose a method to identify all cliques
of a fixed size k with a given probability P*. The method is based on the construction of
two sets, which we call the upper set of cliques and the low set of cliques. We show that
the set of all cliques of size k in the true threshold graph is included in the upper set of
cliques with a given probability P*, and that this set includes all cliques from the low set
of cliques with probability P*. We propose an algorithm to simultaneously decrease the
upper set and increase the low set of cliques without loss of significance.

Our method is based on multiple hypotheses testing theory, where we use
statistical procedures with family-wise error rate (FWER) control [12], [2], and on its
connection with the theory of subset selection procedures [10], [11]. An important part of
our method is related to directional statistical decisions [13], [16], [20].

The article is organized as follows: in section 2 basic definitions and notations
are given; in section 3 it is shown that upper and low sets for edges can be constructed
by multiple hypotheses testing procedures with FWER control in the strong
sense; in section 4 upper and low sets for true cliques of size k are constructed; in
section 5 the proposed method is formulated; in section 6 a procedure for simultaneous
construction of upper and low sets for edges is proposed; in section 7 an improved
procedure for simultaneous construction of upper and low sets for cliques of size greater
than two is proposed; in section 8 an application of the proposed approach to stock market
analysis is presented and the obtained results are discussed; in section 9 the conclusions
are given.





2 Basic definitions and notations 


Let $X = (X_1, \dots, X_N)$ be a random vector with distribution $f(x; \theta)$, $\theta \in \Omega$, where $\Omega$
is a parametric space, $x \in R^N$. Let $\gamma_{i,j}(\theta) = \gamma(X_i, X_j)$ be a measure of dependence
between $X_i$, $X_j$, $i,j = 1,\dots,N$. As a measure of dependence one can consider Pearson
correlation, Fechner correlation, Kendall correlation, etc. According to [14] a pair
$(X, \gamma)$ will be called a random variables network.

The matrix
$$\Gamma = (\gamma_{i,j}(\theta)), \quad i,j = 1,\dots,N \tag{1}$$
describes all pairwise dependencies between components of the vector $X$. The random
variables network generates a network model. The network model for the random
variables network $(X, \gamma)$ is the complete weighted graph with $N$ nodes $(V, \Gamma)$, where
$V = \{1, 2, \dots, N\}$ is the set of nodes and $\Gamma = (\gamma_{i,j})$ is the matrix of weights,
$\gamma_{i,j} = \gamma(X_i, X_j)$, $i \neq j$, $i,j = 1,\dots,N$. We will call this complete weighted graph the true
network model. The threshold graph in the network model $(V, \Gamma)$ is the unweighted graph
$(V, E)$, where $(i,j) \in E$ if and only if $\gamma_{i,j} > \gamma_0$. This threshold graph will be
called the true threshold graph. A clique in the threshold graph $(V, E)$ is a set of nodes
$I_{cl}(\theta, \gamma_0) = \{i_1, \dots, i_k\}$, $i_j \in V$, $j = 1,\dots,k$, such that for all $i,j \in \{i_1, \dots, i_k\}$ one has
$\gamma_{i,j}(\theta) > \gamma_0$. These cliques will be called true cliques.

Note that there are two conventions for including an edge in the threshold graph. In the
first, adopted for example in [8], an edge is added to the threshold
graph if and only if $\gamma_{i,j} = \gamma(X_i, X_j) \geq \gamma_0$, where $\gamma_0$ is a given threshold. In the
second, adopted in [14], [17], edge $(i,j) \in E$ if and only if
$\gamma_{i,j} = \gamma(X_i, X_j) > \gamma_0$, where $\gamma_0$ is a given threshold. If $\gamma_{i,j}$ is measured by correlation
then the threshold graph can be called a correlation graph [17].

For convenience we will distinguish three types of node pairs. Edge $(i,j) \in E$ if
$\gamma_{i,j} = \gamma(X_i, X_j) > \gamma_0$, where $\gamma_0$ is a given threshold. Such pairs of vertices will be
denoted by PVE (pair of vertices with edge). In the case $\gamma_{i,j} < \gamma_0$ the corresponding
pair $(i,j)$ will be called a pair of vertices without edge and will be denoted by PVN. Pairs
$(i,j)$ such that $\gamma_{i,j} = \gamma_0$ will be called pairs with uncertainty and will be denoted by
PVU. These pairs can be interpreted as edges as well as pairs of vertices without
edges.

Let us introduce the following notations.

$J = \{(i,j) : i < j;\ i,j = 1,\dots,N\}$ is the set of index pairs.

$J_e(\theta, \gamma_0) = \{(i,j) : \gamma_{i,j}(\theta) > \gamma_0,\ i < j\}$ is the set of PVE,
$J_n(\theta, \gamma_0) = \{(i,j) : \gamma_{i,j}(\theta) < \gamma_0,\ i < j\}$ is the set of PVN,
$J_f(\theta, \gamma_0) = \{(i,j) : \gamma_{i,j}(\theta) = \gamma_0,\ i < j\}$ is the set of PVU.

It is obvious that the sets $J_e(\theta, \gamma_0)$, $J_n(\theta, \gamma_0)$, $J_f(\theta, \gamma_0)$ form a partition of the set $J$.
Let $x = (x_i(t))$, $i = 1,\dots,N$; $t = 1,\dots,n$, be a sample of size $n$ from the distribution
of the vector $X$.
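The partition of index pairs into PVE, PVN and PVU can be sketched in a few lines. This is an illustration only: the function name `partition_pairs`, the 0-based node labels and the optional tolerance `tol` are assumptions made for the example and are not part of the original notation.

```python
from itertools import combinations

def partition_pairs(gamma, gamma0, tol=0.0):
    """Split index pairs (i, j), i < j, into the sets J_e, J_n and J_f.

    gamma  : symmetric N x N matrix (list of lists) of true dependence values
    gamma0 : threshold
    tol    : hypothetical tolerance for treating a value as equal to gamma0
    """
    pve, pvn, pvu = set(), set(), set()
    for i, j in combinations(range(len(gamma)), 2):
        if abs(gamma[i][j] - gamma0) <= tol:
            pvu.add((i, j))   # pair with uncertainty
        elif gamma[i][j] > gamma0:
            pve.add((i, j))   # pair of vertices with edge
        else:
            pvn.add((i, j))   # pair of vertices without edge
    return pve, pvn, pvu

# Toy dependence matrix for N = 3 nodes (illustrative values).
gamma = [[1.0, 0.6, 0.4],
         [0.6, 1.0, 0.2],
         [0.4, 0.2, 1.0]]
J_e, J_n, J_f = partition_pairs(gamma, gamma0=0.4)
```

In practice the true values are unknown, which is why the rest of the article replaces this exact partition by upper and low sets built from a sample.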


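Definition 2.1. A set $J_{ue}(x, \gamma_0) \subseteq J$ satisfying
$$P_\theta(J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega \tag{2}$$
will be called an upper set of level $P^*$ for $J_e(\theta, \gamma_0)$ and denoted $\mathrm{USE}(\gamma_0, P^*)$.

Definition 2.2. A set $J_{un}(x, \gamma_0) \subseteq J$ satisfying
$$P_\theta(J_{un}(x, \gamma_0) \supseteq J_n(\theta, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega \tag{3}$$
will be called an upper set of level $P^*$ for $J_n(\theta, \gamma_0)$ and denoted $\mathrm{USN}(\gamma_0, P^*)$.

Definition 2.3. A set $J_{le}(x, \gamma_0) \subseteq J$ satisfying
$$P_\theta(J_{le}(x, \gamma_0) \subseteq J_e(\theta, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega \tag{4}$$
will be called a low set of level $P^*$ for $J_e(\theta, \gamma_0)$ and denoted $\mathrm{LSE}(\gamma_0, P^*)$.

Definition 2.4. A set $J_{ln}(x, \gamma_0) \subseteq J$ satisfying
$$P_\theta(J_{ln}(x, \gamma_0) \subseteq J_n(\theta, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega \tag{5}$$
will be called a low set of level $P^*$ for $J_n(\theta, \gamma_0)$ and denoted $\mathrm{LSN}(\gamma_0, P^*)$.

Definition 2.5. Sets $J_{ue}(x, \gamma_0) \subseteq J$, $J_{le}(x, \gamma_0) \subseteq J$ satisfying
$$P_\theta(J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \supseteq J_{le}(x, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega$$
will be called simultaneous upper and low sets of level $P^*$ for $J_e(\theta, \gamma_0)$.

Definition 2.6. The set of all true cliques of size $k$ for threshold $\gamma_0$ will be denoted by
$$I_{soc}(\theta, \gamma_0, k) = \{I_{cl_l}(\theta, \gamma_0) : |I_{cl_l}(\theta, \gamma_0)| = k\}$$
where $|I_{cl_l}(\theta, \gamma_0)|$ is the number of nodes of $I_{cl_l}(\theta, \gamma_0)$.

Definition 2.7. A set of cliques $I^u_{soc}(x, \gamma_0, k)$ satisfying
$$P_\theta(I^u_{soc}(x, \gamma_0, k) \supseteq I_{soc}(\theta, \gamma_0, k)) \geq P^*, \quad \forall \theta \in \Omega \tag{6}$$
will be called an upper set of level $P^*$ for the set $I_{soc}(\theta, \gamma_0, k)$ and denoted $\mathrm{USC}(\gamma_0, P^*, k)$.

Definition 2.8. A set of cliques $I^l_{soc}(x, \gamma_0, k)$ satisfying
$$P_\theta(I^l_{soc}(x, \gamma_0, k) \subseteq I_{soc}(\theta, \gamma_0, k)) \geq P^*, \quad \forall \theta \in \Omega \tag{7}$$
will be called a low set of level $P^*$ for the set $I_{soc}(\theta, \gamma_0, k)$ and denoted $\mathrm{LSC}(\gamma_0, P^*, k)$.

Definition 2.9. Sets $I^l_{soc}(x, \gamma_0, k)$, $I^u_{soc}(x, \gamma_0, k)$ satisfying
$$P_\theta(I^l_{soc}(x, \gamma_0, k) \subseteq I_{soc}(\theta, \gamma_0, k) \subseteq I^u_{soc}(x, \gamma_0, k)) \geq P^*, \quad \forall \theta \in \Omega \tag{8}$$
will be called simultaneous low and upper sets of level $P^*$ for the set $I_{soc}(\theta, \gamma_0, k)$.

Definition 2.10. The set of all PVE of all cliques of size $k$ will be denoted by
$J^e_{soc}(\theta, \gamma_0, k) = \{(i,j) : \exists l \text{ such that } i,j \in I_{cl_l}(\theta, \gamma_0),\ |I_{cl_l}(\theta, \gamma_0)| = k\}$.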


3 Upper and low sets for edges 
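Consider the set of individual hypotheses
$$h^e_{i,j} : \gamma_{i,j} \geq \gamma_0 \ \text{ versus } \ k^e_{i,j} : \gamma_{i,j} < \gamma_0, \quad i < j \tag{9}$$
Standard tests for testing hypotheses (9) have the form
$$\varphi^e_{i,j}(x) = \begin{cases} 1, & T_{i,j}(x) < c^e_{i,j} \\ 0, & T_{i,j}(x) \geq c^e_{i,j} \end{cases} \tag{10}$$
where $c^e_{i,j}$ is defined from
$$P_{\gamma_0}(T_{i,j} < c^e_{i,j}) = \alpha_{i,j}$$
Let $J_{ue}(x, \gamma_0) = \{(i,j) : \varphi^e_{i,j}(x) = 0,\ i < j\}$ be the set of index pairs for which
the hypotheses (9) are accepted. The corresponding set in the sample space has the form
$$\Big\{x : \prod_{(i,j) \in J_{ue}(x, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1\Big\}$$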


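Lemma 3.1. Condition (2) is equivalent to the condition
$$P_\theta\Big(\prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1\Big) \geq P^*, \quad \forall \theta \in \Omega \tag{11}$$

Proof. It is sufficient to prove the equivalence of the events
$$J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \quad \text{and} \quad \prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1$$
If
$$J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0)$$
then for all $(i,j) \in J_e(\theta, \gamma_0)$ one has $\varphi^e_{i,j}(x) = 0$ and therefore
$$\prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1$$
Conversely, if
$$\prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1$$
then for all $(i,j) \in J_e(\theta, \gamma_0)$ one has $\varphi^e_{i,j}(x) = 0$ and therefore $(i,j) \in J_{ue}(x, \gamma_0)$ by definition of the
set $J_{ue}(x, \gamma_0)$. $\square$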


Note 3.1. In a sense, lemma 3.1 was proved in [11]; it is given here for the sake of
completeness.


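Note 3.2. Similarly one can prove the equivalence of the events
$$J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \cup J_f(\theta, \gamma_0)$$
and
$$\prod_{(i,j) \in J_e(\theta, \gamma_0) \cup J_f(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1$$
This means that the set $J_{ue}(x, \gamma_0)$ contains the union of the sets
$J_e(\theta, \gamma_0)$ and $J_f(\theta, \gamma_0)$ with probability at least $P^*$.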


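Let the set of tests $\varphi^e_{i,j}(x)$, $i,j = 1,\dots,N$, $i < j$, be a multiple hypotheses testing
procedure for testing hypotheses (9) with FWER control in the strong sense at significance
level $1 - P^*$, i.e.
$$P_\theta\Big(\prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 0\Big) \leq 1 - P^*, \quad \forall \theta \in \Omega$$
As follows from lemma 3.1 and note 3.2, $J_{ue}(x, \gamma_0) = \mathrm{USE}(\gamma_0, P^*)$. Therefore $\mathrm{USE}(\gamma_0, P^*)$
contains all pairs $(i,j)$, $i < j$, such that the hypotheses $h^e_{i,j} : \gamma_{i,j} \geq \gamma_0$ are accepted.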
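For testing the individual hypotheses
$$h^n_{i,j} : \gamma_{i,j} \leq \gamma_0 \ \text{ versus } \ k^n_{i,j} : \gamma_{i,j} > \gamma_0, \quad i < j \tag{12}$$
standard tests have the form
$$\varphi^n_{i,j}(x) = \begin{cases} 1, & T_{i,j}(x) > c^n_{i,j} \\ 0, & T_{i,j}(x) \leq c^n_{i,j} \end{cases} \tag{13}$$
where $c^n_{i,j}$ is defined from
$$P_{\gamma_0}(T_{i,j} > c^n_{i,j}) = \alpha_{i,j}$$
In this case $\mathrm{USN}(\gamma_0, P^*)$ contains all pairs $(i,j)$ such that the hypotheses $h^n_{i,j} : \gamma_{i,j} \leq \gamma_0$
are accepted by a multiple hypotheses testing procedure with FWER control in the strong
sense at significance level $1 - P^*$.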
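Let us define the set
$$J_{le}(x, \gamma_0) = J \setminus \mathrm{USN}(\gamma_0, P^*) \tag{14}$$
The corresponding set in the sample space has the form
$$\Big\{x : \prod_{(i,j) \in J_{le}(x, \gamma_0)} \varphi^n_{i,j}(x) = 1\Big\} = \{x : T_{i,j}(x) > c^n_{i,j}\ \ \forall (i,j) \in J_{le}(x, \gamma_0),\ i < j\}$$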
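Lemma 3.2. $J \setminus \mathrm{USN}(\gamma_0, P^*) = \mathrm{LSE}(\gamma_0, P^*)$.
Proof. From note 3.2 one has
$$P_\theta(J_{un}(x, \gamma_0) \supseteq (J_n(\theta, \gamma_0) \cup J_f(\theta, \gamma_0))) \geq P^*, \quad \forall \theta \in \Omega$$
Since $J_{le}(x, \gamma_0) = J \setminus J_{un}(x, \gamma_0)$, then
$$P_\theta(J \setminus J_{un}(x, \gamma_0) \subseteq J \setminus (J_n(\theta, \gamma_0) \cup J_f(\theta, \gamma_0))) \geq P^*, \quad \forall \theta \in \Omega$$
or
$$P_\theta(J_{le}(x, \gamma_0) \subseteq J_e(\theta, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega$$
$\square$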


Corollary 3.1. $\mathrm{LSE}(\gamma_0, P^*)$ contains at least one PVN or PVU (pair $(i,j)$ : $\gamma_{i,j} \leq \gamma_0$)
with probability $\leq 1 - P^*$.


Therefore $\mathrm{LSE}(\gamma_0, P^*)$ can be constructed by any multiple hypotheses testing
procedure for testing hypotheses (12) with FWER control in the strong sense at
significance level $1 - P^*$. $\mathrm{LSE}(\gamma_0, P^*)$ contains all pairs $(i,j)$ such that the hypotheses
$h^n_{i,j} : \gamma_{i,j} \leq \gamma_0$ are rejected.
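As an illustration of this section, the two edge sets can be computed with the simplest FWER-controlling procedure, the single-step Bonferroni correction. The sketch below is not taken from the article: the function name `bonferroni_edge_sets` and the dictionary layout are hypothetical, and it assumes that every statistic is approximately standard normal when the dependence measure equals the threshold.

```python
from statistics import NormalDist

def bonferroni_edge_sets(T, p_star):
    """Build an upper and a low edge set by single-step Bonferroni tests.

    T      : dict {(i, j): T_ij} for i < j; each T_ij is ASSUMED to be
             approximately N(0, 1) when gamma_ij equals the threshold
    p_star : required confidence level P*
    """
    alpha = (1.0 - p_star) / len(T)          # per-test significance level
    c_e = NormalDist().inv_cdf(alpha)        # reject gamma_ij >= gamma0 if T_ij < c_e
    c_n = NormalDist().inv_cdf(1.0 - alpha)  # reject gamma_ij <= gamma0 if T_ij > c_n
    use = {pair for pair, t in T.items() if t >= c_e}  # "edge" hypothesis accepted
    lse = {pair for pair, t in T.items() if t > c_n}   # "no edge" hypothesis rejected
    return use, lse

# Toy statistics for three pairs (illustrative values).
T = {(0, 1): 3.5, (0, 2): 0.3, (1, 2): -3.0}
use, lse = bonferroni_edge_sets(T, p_star=0.9)
```

Here the pair (0, 1) is a reliable edge, the pair (1, 2) is excluded even from the upper set, and the pair (0, 2) stays in the upper set only.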


4 Upper and low sets for cliques 
Let
$$I^{USE}_{soc}(x, \gamma_0, k) = \{I_{cl_l}(\mathrm{USE}(\gamma_0, P^*)) : |I_{cl_l}(\mathrm{USE}(\gamma_0, P^*))| = k,\ l = 1,\dots,s\}$$
be the set of all cliques of size $k$ in the set $\mathrm{USE}(\gamma_0, P^*)$.
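Theorem 4.1. For any $I_{cl}(\theta, \gamma_0) \in I_{soc}(\theta, \gamma_0, k)$ one has:
$$P(\exists I_{cl_l}(\mathrm{USE}(\gamma_0, P^*)) \in I^{USE}_{soc}(x, \gamma_0, k) : I_{cl_l}(\mathrm{USE}(\gamma_0, P^*)) = I_{cl}(\theta, \gamma_0)) \geq P^*$$
Proof. Let $I_{cl}(\theta, \gamma_0) = \{i_1, \dots, i_k\}$. Define
$$J_{cl}(\theta, \gamma_0) = \{(i,j) : \gamma_{i,j}(\theta) > \gamma_0,\ \forall i,j \in \{i_1, \dots, i_k\},\ i < j\}$$
Since $J_{cl}(\theta, \gamma_0) \subseteq J_e(\theta, \gamma_0)$, then from
$$\prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1$$
follows
$$\prod_{(i,j) \in J_{cl}(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1$$
Therefore
$$P^* \leq P\Big(\prod_{(i,j) \in J_e(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1\Big) \leq P\Big(\prod_{(i,j) \in J_{cl}(\theta, \gamma_0)} (1 - \varphi^e_{i,j}(x)) = 1\Big)$$
Then the pairs $(i,j) \in J_{cl}(\theta, \gamma_0)$ are elements of the set $\mathrm{USE}(\gamma_0, P^*)$. From the definition of
the set $J_{cl}(\theta, \gamma_0)$ the theorem follows. $\square$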


Corollary 4.1.
$$I^{USE}_{soc}(x, \gamma_0, k) = \mathrm{USC}(\gamma_0, P^*, k)$$
To prove the corollary it is sufficient to replace $J_{cl}(\theta, \gamma_0)$ by $J^e_{soc}(\theta, \gamma_0, k)$ in the
proof of theorem 4.1.
Let
$$I^{LSE}_{soc}(x, \gamma_0, k) = \{I_{cl_l}(\mathrm{LSE}(\gamma_0, P^*)) : |I_{cl_l}(\mathrm{LSE}(\gamma_0, P^*))| = k,\ l = 1,\dots,s\}$$
be the set of all cliques of size $k$ in the set $\mathrm{LSE}(\gamma_0, P^*)$.


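Theorem 4.2. 1. If $I_{soc}(\theta, \gamma_0, k) \neq \emptyset$ then
$$P(I^{LSE}_{soc}(x, \gamma_0, k) \subseteq I_{soc}(\theta, \gamma_0, k)) \geq P^*$$
2. If $I_{soc}(\theta, \gamma_0, k) = \emptyset$ then
$$P(\exists I_{cl_l}(\mathrm{LSE}(\gamma_0, P^*)) : |I_{cl_l}(\mathrm{LSE}(\gamma_0, P^*))| = k) \leq 1 - P^*$$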


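Proof. 1. If $I^{LSE}_{soc}(x, \gamma_0, k) = \emptyset$ then the statement is obvious.

Let $I^{LSE}_{soc}(x, \gamma_0, k) \neq \emptyset$. Consider the set $J^e_{soc}(\mathrm{LSE}(\gamma_0, P^*), k)$, the set of all edges
of all cliques of size $k$ from $I^{LSE}_{soc}(x, \gamma_0, k)$. Since $J^e_{soc}(\mathrm{LSE}(\gamma_0, P^*), k) \subseteq \mathrm{LSE}(\gamma_0, P^*) \subseteq J$, then by
lemma 3.2
$$P_\theta(J^e_{soc}(\mathrm{LSE}(\gamma_0, P^*), k) \subseteq J_e(\theta, \gamma_0)) \geq P^*$$
Since $J^e_{soc}(\mathrm{LSE}(\gamma_0, P^*), k)$ is the set of all edges of all cliques of size $k$, the
corresponding edges, which belong to $J_e(\theta, \gamma_0)$, are clique edges in the true threshold graph as well;
therefore the sets of incident vertices are elements of the set $I_{soc}(\theta, \gamma_0, k)$.

2. Since $I_{soc}(\theta, \gamma_0, k) = \emptyset$, in any set of $k$ vertices there exist two vertices
$i, j$ such that $\gamma_{i,j} \leq \gamma_0$. Then
$$P(\exists I_{cl_l}(\mathrm{LSE}(\gamma_0, P^*)) : |I_{cl_l}(\mathrm{LSE}(\gamma_0, P^*))| = k) \leq P(\exists (i,j) \in \mathrm{LSE}(\gamma_0, P^*) : \gamma_{i,j} \leq \gamma_0) \leq 1 - P^*$$
by corollary 3.1. $\square$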


Corollary 4.2.
$$I^{LSE}_{soc}(x, \gamma_0, k) = \mathrm{LSC}(\gamma_0, P^*, k)$$


5 Cliques identification method 


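The main points of the proposed method to identify all cliques of a fixed size $k$ in the true
threshold graph with a given probability $P^*$ are the following:


1. construction of $\mathrm{USE}(\gamma_0, P^*)$ and $\mathrm{LSE}(\gamma_0, P^*)$ and formulation of conclusions: $\mathrm{LSE}(\gamma_0, P^*)$
is the set of all reliable edges; $\mathrm{USE}(\gamma_0, P^*) \setminus \mathrm{LSE}(\gamma_0, P^*)$ is the set of all unreliable
edges;


2. construction of $\mathrm{USC}(\gamma_0, P^*, k)$ and $\mathrm{LSC}(\gamma_0, P^*, k)$ for given $k > 2$;


3. comparative analysis of $\mathrm{USC}(\gamma_0, P^*, k)$, $\mathrm{LSC}(\gamma_0, P^*, k)$ and formulation of conclusions:


(a) if $\mathrm{USC}(\gamma_0, P^*, k) = \emptyset$ then the conclusion is $I_{soc}(\theta, \gamma_0, k) = \emptyset$;


(b) if $\mathrm{LSC}(\gamma_0, P^*, k) \neq \emptyset$ then the conclusion is: all identified cliques from the set
$\mathrm{LSC}(\gamma_0, P^*, k)$ are reliable;


(c) if $\mathrm{USC}(\gamma_0, P^*, k) \setminus \mathrm{LSC}(\gamma_0, P^*, k) = \emptyset$ then the conclusion is: $I_{soc}(\theta, \gamma_0, k) =
\mathrm{LSC}(\gamma_0, P^*, k)$;

(d) if $\mathrm{USC}(\gamma_0, P^*, k) \setminus \mathrm{LSC}(\gamma_0, P^*, k) \neq \emptyset$ then the conclusion is: all cliques from
$\mathrm{LSC}(\gamma_0, P^*, k)$ are reliable and cliques from $\mathrm{USC}(\gamma_0, P^*, k) \setminus \mathrm{LSC}(\gamma_0, P^*, k)$
are unreliable;


4. cliques from $\mathrm{USC}(\gamma_0, P^*, k) \setminus \mathrm{LSC}(\gamma_0, P^*, k)$ can be ordered by the number of
pairs $(i,j) \in J^e_{soc}(\mathrm{USE}(\gamma_0, P^*)) \setminus J^e_{soc}(\mathrm{LSE}(\gamma_0, P^*))$, where $J^e_{soc}(\mathrm{USE}(\gamma_0, P^*))$ is
the set of all edges of all cliques from $\mathrm{USC}(\gamma_0, P^*, k)$.

Obviously, a smaller $\mathrm{USC}(\gamma_0, P^*, k)$ and a greater $\mathrm{LSC}(\gamma_0, P^*, k)$
lead to a larger proportion of reliable conclusions. The sizes of $\mathrm{USC}(\gamma_0, P^*, k)$ and
$\mathrm{LSC}(\gamma_0, P^*, k)$ depend on the multiple hypotheses testing procedure used to construct
$\mathrm{USE}(\gamma_0, P^*)$, $\mathrm{LSE}(\gamma_0, P^*)$ and on the procedure used to construct $\mathrm{USC}(\gamma_0, P^*, k)$, $\mathrm{LSC}(\gamma_0, P^*, k)$.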
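Steps 2 to 4 of the method can be sketched as follows, assuming the two edge sets are already available as sets of pairs (i, j) with i < j. The function names and the brute-force enumeration are choices made for this illustration only; the enumeration checks every k-subset of nodes, so it is practical only for small networks.

```python
from itertools import combinations

def cliques_of_size(edges, nodes, k):
    """Return all k-node subsets whose every pair is an edge (brute force)."""
    return {c for c in combinations(sorted(nodes), k)
            if all(pair in edges for pair in combinations(c, 2))}

def classify_cliques(use, lse, nodes, k):
    """Return reliable cliques and unreliable cliques ranked by step 4."""
    usc = cliques_of_size(use, nodes, k)   # cliques of size k in the upper edge set
    lsc = cliques_of_size(lse, nodes, k)   # cliques of size k in the low edge set
    # Order unreliable cliques by their number of unreliable edges.
    ranked = sorted(usc - lsc,
                    key=lambda c: (sum(p not in lse for p in combinations(c, 2)), c))
    return lsc, ranked

# Toy edge sets on four nodes (illustrative values); lse is a subset of use.
lse_edges = {(0, 1), (0, 2), (1, 2)}
use_edges = lse_edges | {(0, 3), (1, 3), (2, 3)}
reliable, unreliable = classify_cliques(use_edges, lse_edges, range(4), k=3)
```

In this toy case there is one reliable clique of size 3 and three unreliable ones, each with two unreliable edges.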


6 Simultaneous upper and lower sets construction for edges


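Let us consider the question: is it necessary to take the multiplicity effect into account
in simultaneous construction of upper and low sets? According to definition 2.5, sets
$J_{ue}(x, \gamma_0) \subseteq J$, $J_{le}(x, \gamma_0) \subseteq J$ are simultaneous upper and low sets of level $P^*$ for $J_e(\theta, \gamma_0)$ if
$$P_\theta(J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \supseteq J_{le}(x, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega$$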


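For simultaneous construction of upper and low sets for edges one can use the following
simple relations:
$$P_\theta(J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \supseteq J_{le}(x, \gamma_0)) = 1 - P_\theta(\{J_{ue}(x, \gamma_0) \not\supseteq J_e(\theta, \gamma_0)\} \cup \{J_e(\theta, \gamma_0) \not\supseteq J_{le}(x, \gamma_0)\})$$
If $J_{ue}(x, \gamma_0) = \mathrm{USE}(\gamma_0, P_1^*)$, $J_{le}(x, \gamma_0) = \mathrm{LSE}(\gamma_0, P_2^*)$ then
$$P_\theta(J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \supseteq J_{le}(x, \gamma_0)) \geq 1 - (1 - P_1^*) - (1 - P_2^*) = (P_1^* + P_2^*) - 1$$
Therefore the simplest way to construct simultaneous upper and low sets for
$J_e(\theta, \gamma_0)$ is to construct an upper set of level $P_1^*$ and a low set of level $P_2^*$ satisfying
$P^* = P_1^* + P_2^* - 1$. In the particular case of equal levels, $P_1^* = P_2^* = \frac{1 + P^*}{2}$. The simplest procedure has the
form: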

Procedure 1:


• Apply the Bonferroni procedure with FWER $= 1 - P_1^*$ to test the hypotheses $h^e_{i,j} :
\gamma_{i,j} \geq \gamma_0$ and construct $\mathrm{USE}(\gamma_0, P_1^*)$.


• Apply the Bonferroni procedure with FWER $= 1 - P_2^*$ to test the hypotheses $h^n_{i,j} :
\gamma_{i,j} \leq \gamma_0$ and construct $\mathrm{LSE}(\gamma_0, P_2^*)$.


• Combine the obtained results.


In this case simultaneous construction of $\mathrm{LSE}(\gamma_0, P_2^*)$ and $\mathrm{USE}(\gamma_0, P_1^*)$ using multiple
hypotheses testing procedures leads to an $\mathrm{LSE}(\gamma_0, P_2^*)$ of smaller size and to a $\mathrm{USE}(\gamma_0, P_1^*)$
of greater size, because the significance levels $1 - P_1^*$ and $1 - P_2^*$ are smaller than
$1 - P^*$. Such a decrease can be avoided based on results from [20]. Let us present
these results in the framework of the considered problem.
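To make the cost of splitting the level concrete, here is a short worked example. The values $N = 30$ and $P^* = 0.9$ are chosen for illustration, and the per-test levels are those of a Bonferroni correction over all $|J|$ pairs.

```latex
% Worked example with illustrative values N = 30, P^* = 0.9.
\begin{align*}
|J| &= \frac{N(N-1)}{2} = \frac{30 \cdot 29}{2} = 435, \\
P_1^* = P_2^* &= \frac{1 + P^*}{2} = 0.95, \\
\alpha_{\text{split}} &= \frac{1 - P_1^*}{|J|} = \frac{0.05}{435} \approx 1.15 \times 10^{-4}, \\
\alpha_{\text{full}} &= \frac{1 - P^*}{|J|} = \frac{0.10}{435} \approx 2.30 \times 10^{-4}.
\end{align*}
```

Splitting the level therefore halves the per-test significance level compared with testing each family at the full level $1 - P^*$.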





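Property 6.1. For fixed $i,j$ consider the hypotheses $h^e_{i,j} : \gamma_{i,j} \geq \gamma_0$, $h^n_{i,j} : \gamma_{i,j} \leq \gamma_0$
versus $k^e_{i,j} : \gamma_{i,j} < \gamma_0$, $k^n_{i,j} : \gamma_{i,j} > \gamma_0$ correspondingly. Standard tests have the form:
$$\varphi^n_{i,j} = \begin{cases} 1, & T_{i,j} > c_2 \\ 0, & T_{i,j} \leq c_2 \end{cases} \qquad \varphi^e_{i,j} = \begin{cases} 1, & T_{i,j} < c_1 \\ 0, & T_{i,j} \geq c_1 \end{cases}$$
where $c_1$ is defined from $P_{\gamma_0}(T_{i,j} < c_1) = \alpha_1$ and $c_2$ is defined from $P_{\gamma_0}(T_{i,j} > c_2) =
\alpha_2$. Then for $\alpha_1 + \alpha_2 < 1$ one has $c_1 < c_2$.

If pairs $(i,j)$ such that $\gamma_{i,j} = \gamma_0$ can be interpreted as edges as well as pairs of
vertices without edges, then no errors arise in the case $\gamma_{i,j} = \gamma_0$ and therefore
simultaneous testing by $\varphi^e_{i,j}$, $\varphi^n_{i,j}$ leads to errors in two cases only:


1. if $\gamma_{i,j} < \gamma_0$ but the statistic $T_{i,j} > c_2$;
2. if $\gamma_{i,j} > \gamma_0$ but the statistic $T_{i,j} < c_1$.


Therefore the probability of error in this case is equal to
$$P_\theta(\text{error}) = \begin{cases} P_\theta(T_{i,j} > c_2), & \gamma_{i,j} < \gamma_0 \\ P_\theta(T_{i,j} < c_1), & \gamma_{i,j} > \gamma_0 \end{cases} \leq \max(\alpha_1, \alpha_2)$$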


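Using this interpretation one can prove
$$P_\theta(\mathrm{USE}(\gamma_0, P_1^*) \supseteq J_e(\theta, \gamma_0) \supseteq \mathrm{LSE}(\gamma_0, P_2^*)) \geq \min(P_1^*, P_2^*), \quad \forall \theta \in \Omega \tag{15}$$
Let us restrict our attention to the case $P_1^* = P_2^* = P^*$.
Theorem 6.1. If $J_{ue}(x, \gamma_0) = \mathrm{USE}(\gamma_0, P^*)$, $J_{le}(x, \gamma_0) = \mathrm{LSE}(\gamma_0, P^*)$ then
$$P_\theta(J_{ue}(x, \gamma_0) \supseteq J_e(\theta, \gamma_0) \supseteq J_{le}(x, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega \tag{16}$$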
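Proof. The event opposite to the event in brackets in (16) has the form
$$A = \{\exists (i,j) \in J \text{ such that } ((i,j) \in J_e(\theta, \gamma_0) \text{ but } (i,j) \notin J_{ue}(x, \gamma_0)) \text{ or } ((i,j) \notin J_e(\theta, \gamma_0) \text{ but } (i,j) \in J_{le}(x, \gamma_0))\}$$
• Let $(i,j) \in J_e(\theta, \gamma_0)$. Then the event $A$ is equivalent to
$$A_1 = \{(i,j) \in J_e(\theta, \gamma_0) \text{ and } (i,j) \notin J_{ue}(x, \gamma_0)\}$$
since it does not matter whether $(i,j)$ belongs to the set $J_{le}(x, \gamma_0)$ or not. Since $J_{ue}(x, \gamma_0) =
\mathrm{USE}(\gamma_0, P^*)$, then $P(A_1) \leq 1 - P^*$.


• Let $(i,j) \notin J_e(\theta, \gamma_0)$. Then the event $A$ is equivalent to
$$A_2 = \{(i,j) \notin J_e(\theta, \gamma_0) \text{ and } (i,j) \in J_{le}(x, \gamma_0)\}$$
since it does not matter whether $(i,j)$ belongs to the set $J_{ue}(x, \gamma_0)$ or not. From corollary
3.1 one has $P(A_2) \leq 1 - P^*$.

$\square$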


Note 6.1. The validity of (15) for $P_1^* \neq P_2^*$ follows from the proof of theorem 6.1.


Therefore procedure 1 can be improved by the choice $P_1^* = P_2^* = P^*$.

In order to apply the proposed method one can use any multiple hypotheses testing
procedure for testing the hypotheses $h^e_{i,j}$, $h^n_{i,j}$ with FWER control in the strong sense. Let
us consider the application of the Holm procedure, the most rejective multiple hypotheses
testing procedure with FWER control in the strong sense in the class of stepdown monotone
procedures Pl For example one can apply Holm procedures for testing hypotheses 
$h^e_{i,j}$, $i,j = 1,\dots,N$, and $h^n_{i,j}$, $i,j = 1,\dots,N$, independently and combine the obtained
results.
Procedure 2:
• Apply the Holm procedure with FWER $= 1 - P^*$ for testing all $\frac{N(N-1)}{2}$ hypotheses
$h^e_{i,j}$, $i,j = 1,\dots,N$, and construct $\mathrm{USE}(\gamma_0, P^*)$.


• Apply the Holm procedure with FWER $= 1 - P^*$ for testing all $\frac{N(N-1)}{2}$ hypotheses
$h^n_{i,j}$, $i,j = 1,\dots,N$, and construct $\mathrm{LSE}(\gamma_0, P^*)$.
Procedure 2 is preferable to procedure 1 since procedure 2 rejects all hypotheses
$h^e_{i,j}$, $h^n_{i,j}$, $i,j = 1,\dots,N$, rejected by procedure 1, and possibly more. This follows from
the comparison of the Bonferroni and Holm procedures and from theorem 6.1.
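The comparison can be illustrated by a small sketch of procedure 2. It is written under assumptions that are not part of the article: the helper names are hypothetical, and the p-values are computed as if every statistic were standard normal at the threshold.

```python
from statistics import NormalDist

def holm_reject(p_values, fwer):
    """Holm step-down procedure: return the keys of the rejected hypotheses."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = set()
    for step, (key, p) in enumerate(ordered):
        if p <= fwer / (m - step):   # level grows as hypotheses are rejected
            rejected.add(key)
        else:
            break                    # the first acceptance stops the procedure
    return rejected

def procedure2(T, p_star):
    """Two independent Holm runs; T maps pairs to statistics ASSUMED N(0, 1)."""
    cdf = NormalDist().cdf
    p_e = {pair: cdf(t) for pair, t in T.items()}        # for gamma_ij >= gamma0
    p_n = {pair: 1.0 - cdf(t) for pair, t in T.items()}  # for gamma_ij <= gamma0
    use = set(T) - holm_reject(p_e, 1.0 - p_star)  # pairs with "edge" accepted
    lse = holm_reject(p_n, 1.0 - p_star)           # pairs with "no edge" rejected
    return use, lse

# Toy statistics (illustrative values).
T = {(0, 1): 3.5, (0, 2): 1.7, (1, 2): -3.0}
use, lse = procedure2(T, p_star=0.9)
```

With these values the pair (0, 2) has a one-sided p-value of about 0.045. It misses the Bonferroni level 0.1/3 but passes the second Holm level 0.1/2, so the Holm run adds it to the low set.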
A more interesting case is to apply the Holm procedures sequentially.
Procedure 3:


• Apply the Holm procedure with FWER $= 1 - P^*$ for testing all $\frac{N(N-1)}{2}$ hypotheses
$h^e_{i,j}$, $i,j = 1,\dots,N$, and construct $\mathrm{USE}(\gamma_0, P^*)$.


• For pairs $(i,j)$ such that the hypotheses $h^e_{i,j}$ are accepted, test the hypotheses $h^n_{i,j}$ using
the Holm procedure with FWER $= 1 - P^*$ and construct $\mathrm{LSE}(\gamma_0, P^*)$ from all pairs
$(i,j)$ such that the hypotheses $h^n_{i,j}$, $i,j = 1,\dots,N$, are rejected.


Procedure 4:


• Apply the Holm procedure with FWER $= 1 - P^*$ for testing all $\frac{N(N-1)}{2}$ hypotheses
$h^n_{i,j}$, $i,j = 1,\dots,N$, and construct $\mathrm{LSE}(\gamma_0, P^*)$ from the pairs $(i,j)$ such that
the hypotheses $h^n_{i,j}$, $i,j = 1,\dots,N$, are rejected.


• For pairs $(i,j)$ such that the hypotheses $h^n_{i,j}$ are accepted, test the hypotheses $h^e_{i,j}$ using
the Holm procedure with FWER $= 1 - P^*$ and construct $\mathrm{USE}(\gamma_0, P^*)$ from all
pairs $(i,j)$ such that the hypotheses $h^e_{i,j}$, $i,j = 1,\dots,N$, are accepted and the pairs from
$\mathrm{LSE}(\gamma_0, P^*)$.


Procedures 3 and 4 are preferable to procedure 2 since they
can decrease the size of $\mathrm{USE}(\gamma_0, P^*)$ and increase the size of $\mathrm{LSE}(\gamma_0, P^*)$ by
reducing the number of hypotheses tested at the second step. A procedure better than
procedures 3 and 4 is proposed below.





6.1 Procedure 5. Holm type procedure. 


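Procedure 5 is formulated as the following algorithm:


1. Initialization. Set $J = \{(i,j) : i < j;\ i,j = 1,\dots,N\}$; $|J|$ is the number of
elements of $J$; $T_{i,j}$, $(i,j) \in J$, are the test statistics for testing hypotheses (9) and
(12).


2. $J_{un}(x, \gamma_0) = J$, $J_{ue}(x, \gamma_0) = J$, $J_{ln}(x, \gamma_0) = \emptyset$, $J_{le}(x, \gamma_0) = \emptyset$.


3. $U_J = \max_{(i,j) \in J} T_{i,j}$, $(i_{max}, j_{max}) = \arg\max_{(i,j) \in J} T_{i,j}$; $V_J = \min_{(i,j) \in J} T_{i,j}$,
$(i_{min}, j_{min}) = \arg\min_{(i,j) \in J} T_{i,j}$.


4. Define $c^n_{\frac{1-P^*}{|J|}}$ from the equation
$$P_{\gamma_0}\Big(T_{i,j} > c^n_{\frac{1-P^*}{|J|}}\Big) = \frac{1 - P^*}{|J|} \tag{17}$$

5. Define $c^e_{\frac{1-P^*}{|J|}}$ from the equation
$$P_{\gamma_0}\Big(T_{i,j} < c^e_{\frac{1-P^*}{|J|}}\Big) = \frac{1 - P^*}{|J|} \tag{18}$$

6. If $(U_J \leq c^n_{\frac{1-P^*}{|J|}})$ & $(V_J \geq c^e_{\frac{1-P^*}{|J|}})$ then stop: the sets $J_{le}(x, \gamma_0)$ and $J_{ue}(x, \gamma_0)$ are
constructed. Otherwise:


• if $U_J > c^n_{\frac{1-P^*}{|J|}}$ then
$J_{un}(x, \gamma_0) = J_{un}(x, \gamma_0) \setminus \{(i_{max}, j_{max})\}$,
$J_{le}(x, \gamma_0) = J_{le}(x, \gamma_0) \cup \{(i_{max}, j_{max})\}$;

• if $V_J < c^e_{\frac{1-P^*}{|J|}}$ then
$J_{ln}(x, \gamma_0) = J_{ln}(x, \gamma_0) \cup \{(i_{min}, j_{min})\}$,
$J_{ue}(x, \gamma_0) = J_{ue}(x, \gamma_0) \setminus \{(i_{min}, j_{min})\}$;

• if $(U_J > c^n_{\frac{1-P^*}{|J|}})$ & $(V_J < c^e_{\frac{1-P^*}{|J|}})$ then $J = J \setminus \{(i_{max}, j_{max}), (i_{min}, j_{min})\}$;

• if $(U_J \leq c^n_{\frac{1-P^*}{|J|}})$ & $(V_J < c^e_{\frac{1-P^*}{|J|}})$ then $J = J \setminus \{(i_{min}, j_{min})\}$;

• if $(U_J > c^n_{\frac{1-P^*}{|J|}})$ & $(V_J \geq c^e_{\frac{1-P^*}{|J|}})$ then $J = J \setminus \{(i_{max}, j_{max})\}$;

• return to step 3.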
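The following sketch shows one possible reading of procedure 5; it is an interpretation rather than a definitive implementation. It assumes that U and V are the largest and smallest statistics among the pairs still under test, that the critical values are standard normal quantiles at level (1 - P*)/|J|, and that the sets are not re-initialized when the loop repeats. The function name and data layout are hypothetical.

```python
from statistics import NormalDist

def procedure5(T, p_star):
    """Holm-type simultaneous construction of the upper and low edge sets.

    T      : dict {(i, j): T_ij}, statistics ASSUMED N(0, 1) at the threshold
    p_star : required confidence level P*
    """
    active = dict(T)   # pairs still under test (the current set J)
    use = set(T)       # upper set starts as all pairs
    lse = set()        # low set starts empty
    norm = NormalDist()
    while active:
        alpha = (1.0 - p_star) / len(active)
        c_n = norm.inv_cdf(1.0 - alpha)          # upper critical value
        c_e = norm.inv_cdf(alpha)                # lower critical value
        p_max = max(active, key=active.get)      # pair with the largest statistic
        p_min = min(active, key=active.get)      # pair with the smallest statistic
        reject_n = active[p_max] > c_n           # "no edge" rejected for p_max
        reject_e = active[p_min] < c_e           # "edge" rejected for p_min
        if not (reject_n or reject_e):
            break                                # stopping rule of step 6
        if reject_n:
            lse.add(p_max)
            del active[p_max]
        if reject_e:
            use.discard(p_min)
            active.pop(p_min, None)
    return use, lse

# Toy statistics (illustrative values).
T = {(0, 1): 3.5, (0, 2): 1.5, (1, 2): -3.0}
use, lse = procedure5(T, p_star=0.9)
```

In this toy run two pairs are decided at the first step, so the remaining pair (0, 2) is tested at the full level 0.1 (critical value about 1.28) and enters the low set. Two separate Holm runs would test it at level 0.05 and leave it out.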


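Theorem 6.2. For the sets $J_{le}(x, \gamma_0)$, $J_{ue}(x, \gamma_0)$ constructed by procedure 5:
• $J_{le}(x, \gamma_0) = \mathrm{LSE}(\gamma_0, P^*)$.
• $J_{ue}(x, \gamma_0) = \mathrm{USE}(\gamma_0, P^*)$.
• The size of $J_{le}(x, \gamma_0)$ is not less than the sizes of $\mathrm{LSE}(\gamma_0, P^*)$ constructed by procedures
1 - 4. The size of $J_{ue}(x, \gamma_0)$ is not greater than the sizes of $\mathrm{USE}(\gamma_0, P^*)$ constructed by
procedures 1 - 4.


Proof. • For step $k$ set $J_k = J \setminus \{(i_1, j_1), \dots, (i_{k-1}, j_{k-1})\}$. The event $A$ = "at step $k$
at least one pair $(i,j)$ : $\gamma_{i,j} \leq \gamma_0$ belongs to $J_{le}(x, \gamma_0)$" is equivalent to $A_1 \cap A_2$,
where the event $A_1$ means that procedure 5 does not stop before step $k$ and the event
$A_2$ means that $U_{J_k} > c^n_{\frac{1-P^*}{|J_k|}}$. Therefore if $\gamma_{i,j} \leq \gamma_0$, one has
$$P((i,j) \in J_{le}(x, \gamma_0)) \leq P_{\gamma_0}\Big(U_{J_k} > c^n_{\frac{1-P^*}{|J_k|}}\Big) \leq 1 - P^*$$
Since at any step $k$ one has $J_{le}(x, \gamma_0) \cup J_{un}(x, \gamma_0) = J$, then
$$P_\theta(J_{un}(x, \gamma_0) \supseteq J_n(\theta, \gamma_0)) \geq P^*, \quad \forall \theta \in \Omega$$
i.e. $J_{un}(x, \gamma_0) = \mathrm{USN}(\gamma_0, P^*)$. Then it follows from lemma 3.2 that $J_{le}(x, \gamma_0) = \mathrm{LSE}(\gamma_0, P^*)$.

• The proof is similar to the previous one.


• This is related to the ability of the Holm type procedure to correctly reject a wrong
hypothesis which was accepted at a previous step, without violating $P^*$.

$\square$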


Example 6.1. Consider the case N = 3. Let T\,2 < Ca, Ca < Tya < Ca Ca < T23 < 
Ca. In the case: 
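• procedure 2 leads to the result: $\mathrm{LSE}(\gamma_0, P^*) = \emptyset$,
• procedure 3 leads to the result: $\mathrm{LSE}(\gamma_0, P^*) = \{(2,3)\}$, $\mathrm{USE}(\gamma_0, P^*) = \{(1,3$
• procedure 4 leads to the result: $\mathrm{LSE}(\gamma_0, P^*) = \emptyset$, $\mathrm{USE}(\gamma_0, P^*) = \{(1,3), (2,3)\}$;


• procedure 5 (step-by-step algorithm) leads to the result: $\mathrm{LSE}(\gamma_0, P^*) = \{(2,3)\}$,
$\mathrm{USE}(\gamma_0, P^*) = \{(1,3), (2,3)\}$.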


7 Simultaneous upper and lower sets construction for cliques


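By definition 2.9, sets $I^l_{soc}(x, \gamma_0, k)$, $I^u_{soc}(x, \gamma_0, k)$ are simultaneous low and upper sets
of level $P^*$ for $I_{soc}(\theta, \gamma_0, k)$ if
$$P_\theta(I^l_{soc}(x, \gamma_0, k) \subseteq I_{soc}(\theta, \gamma_0, k) \subseteq I^u_{soc}(x, \gamma_0, k)) \geq P^*, \quad \forall \theta \in \Omega \tag{19}$$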


As follows from the results of section 4, the simplest procedure for constructing
simultaneous upper and low sets for cliques of size $k$ has the form:
Procedure 6:


• construct $\mathrm{USE}(\gamma_0, P_1^*)$, $\mathrm{LSE}(\gamma_0, P_2^*)$ using procedure 1;
• construct $I^{USE}_{soc}(x, \gamma_0, k)$, $I^{LSE}_{soc}(x, \gamma_0, k)$.








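Procedure 6 can be improved using ideas from [20].
Theorem 7.1. If $I^u_{soc}(x, \gamma_0, k) = \mathrm{USC}(\gamma_0, P^*, k)$, $I^l_{soc}(x, \gamma_0, k) = \mathrm{LSC}(\gamma_0, P^*, k)$ then
$$P_\theta(I^l_{soc}(x, \gamma_0, k) \subseteq I_{soc}(\theta, \gamma_0, k) \subseteq I^u_{soc}(x, \gamma_0, k)) \geq P^*, \quad \forall \theta \in \Omega$$

Proof. The proof of the theorem is similar to the proof of theorem 6.1 and is given
here for the sake of completeness.
The event opposite to the event in (19) has the form:
$$A = \{\exists I_{cl}(\theta, \gamma_0) \text{ such that } (I_{cl}(\theta, \gamma_0) \in I_{soc}(\theta, \gamma_0, k) \text{ but } I_{cl}(\theta, \gamma_0) \notin I^u_{soc}(x, \gamma_0, k)) \text{ or } (I_{cl}(\theta, \gamma_0) \notin I_{soc}(\theta, \gamma_0, k) \text{ but } I_{cl}(\theta, \gamma_0) \in I^l_{soc}(x, \gamma_0, k))\}$$
• Let $I_{cl}(\theta, \gamma_0) \in I_{soc}(\theta, \gamma_0, k)$. Then the event $A$ is equivalent to the event
$$A_1 = \{I_{cl}(\theta, \gamma_0) \in I_{soc}(\theta, \gamma_0, k) \text{ and } I_{cl}(\theta, \gamma_0) \notin I^u_{soc}(x, \gamma_0, k)\}$$
Since $I^u_{soc}(x, \gamma_0, k) = \mathrm{USC}(\gamma_0, P^*, k)$, then $P(A_1) \leq 1 - P^*$.
• Let $I_{cl}(\theta, \gamma_0) \notin I_{soc}(\theta, \gamma_0, k)$. Then the event $A$ is equivalent to the event
$$A_2 = \{I_{cl}(\theta, \gamma_0) \notin I_{soc}(\theta, \gamma_0, k) \text{ and } I_{cl}(\theta, \gamma_0) \in I^l_{soc}(x, \gamma_0, k)\}$$
Since $I^l_{soc}(x, \gamma_0, k) = \mathrm{LSC}(\gamma_0, P^*, k)$, then $P(A_2) \leq 1 - P^*$.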







Therefore one can propose the following procedure for constructing simultaneous upper
and low sets for cliques of size $k > 2$:
Procedure 7:


1. construct $\mathrm{USE}(\gamma_0, P^*)$, $\mathrm{LSE}(\gamma_0, P^*)$ using the Holm type procedure (procedure 5);


2. construct $I^{USE}_{soc}(x, \gamma_0, k)$, $I^{LSE}_{soc}(x, \gamma_0, k)$.


According to the results of section 4
$$I^{USE}_{soc}(x, \gamma_0, k) = \mathrm{USC}(\gamma_0, P^*, k), \quad I^{LSE}_{soc}(x, \gamma_0, k) = \mathrm{LSC}(\gamma_0, P^*, k)$$
Therefore, as follows from theorem 7.1, the sets $I^{USE}_{soc}(x, \gamma_0, k)$, $I^{LSE}_{soc}(x, \gamma_0, k)$ are
simultaneous upper and low sets of level $P^*$ for $I_{soc}(\theta, \gamma_0, k)$. As follows from theorem 6.2,
procedure 7 leads to a $\mathrm{USC}(\gamma_0, P^*, k)$ of smaller size and an $\mathrm{LSC}(\gamma_0, P^*, k)$ of greater size
than the $\mathrm{USC}(\gamma_0, P^*, k)$ and $\mathrm{LSC}(\gamma_0, P^*, k)$ constructed by procedure 6.





8 Application and discussion 


Let us illustrate some points of the proposed method of clique detection by the
following example. Consider a network model with 30 nodes corresponding to the stocks of the
Dow-Jones index, where the matrix (1) is estimated from observations of stock returns for
the year 2021. A table listing the considered stocks is given in the Appendix. Pearson
correlation is used as the measure $\gamma$. In the terminology of [14], a Gaussian Pearson
network is used as the model of these stock returns. The standard tests (10), (13) in this case
are based on the statistics


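$$T_{i,j} = \frac{\sqrt{n-3}}{2}\left(\ln\frac{1 + r_{i,j}}{1 - r_{i,j}} - \ln\frac{1 + \gamma_0}{1 - \gamma_0}\right), \quad i,j = 1,\dots,N,$$
where
$$r_{i,j} = \frac{\sum_{t=1}^{n} (x_i(t) - \bar{x}_i)(x_j(t) - \bar{x}_j)}{\sqrt{\sum_{t=1}^{n} (x_i(t) - \bar{x}_i)^2 \sum_{t=1}^{n} (x_j(t) - \bar{x}_j)^2}},$$
$c^e_{i,j}$ is the $\alpha$-quantile of the distribution $N(0, 1)$, and $c^n_{i,j}$ is the $(1 - \alpha)$-quantile of the distribution $N(0, 1)$.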
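A minimal sketch of how such statistics could be computed from two return series is given below. It assumes the usual Fisher z-transform form of the statistic, which is approximately standard normal when the true correlation equals the threshold; the function names are illustrative and not taken from the article.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def fisher_statistic(r, gamma0, n):
    """Fisher z statistic for comparing a sample correlation r with gamma0.

    atanh(r) equals 0.5 * ln((1 + r) / (1 - r)), so this is
    sqrt(n - 3) / 2 * (ln((1 + r)/(1 - r)) - ln((1 + gamma0)/(1 - gamma0))).
    """
    return math.sqrt(n - 3) * (math.atanh(r) - math.atanh(gamma0))

# Toy data (illustrative values, not market data).
x = [0.1, -0.2, 0.3, 0.0, 0.4, -0.1]
y = [0.2, -0.1, 0.2, 0.1, 0.5, -0.3]
r_xy = pearson(x, y)
t_xy = fisher_statistic(r_xy, gamma0=0.4, n=len(x))
```

A positive value of the statistic indicates a sample correlation above the threshold, a negative value one below it.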


The procedure 5 is used to construct USE(yo, P*), LSE(yo, P*). The procedure 7 
is used to construct USC (yo, P*, k), LSC (y0, P*, k). Results of the proposed method 
are compared with results of the traditional method. 

The traditional method of clique detection has the form [8]:


• an edge $(i,j)$ between nodes $i,j$ is added to the threshold graph if and only if
$r_{i,j} \geq \gamma_0$;
• cliques of size $k$ are detected.


Let $\mathrm{TSE}(\gamma_0)$ be the set of edges of the threshold graph constructed by the traditional
method. Note that $\mathrm{LSE}(\gamma_0, P^*) \subseteq \mathrm{TSE}(\gamma_0) \subseteq \mathrm{USE}(\gamma_0, P^*)$.

The obtained results are illustrated in fig. 1 - 6. Reliable edges are drawn as solid
lines and unreliable edges are drawn as dotted lines.

$\mathrm{LSE}(\gamma_0, P^*)$, $\mathrm{TSE}(\gamma_0)$ and $\mathrm{USE}(\gamma_0, P^*)$ for $\gamma_0 = 0.4$, $P^* = 0.9$ are presented in
fig. 1, where the isolated nodes are dropped. In fig. 1, left, only edges
from LSE(0.4, 0.9) are presented. There are 21 reliable edges. TSE(0.4) contains
21 reliable as well as 49 unreliable edges. Edges belonging to TSE(0.4)\LSE(0.4, 0.9)
are drawn in fig. 1, center, and all of them are unreliable. Edges belonging to
USE(0.4, 0.9)\TSE(0.4) are drawn in fig. 1, right, and all of them are unreliable
too. USE(0.4, 0.9) contains 288 edges.

Consider cliques of maximum size. The cliques in LSE(0.4, 0.9), TSE(0.4) and
USE(0.4, 0.9) are presented in fig. 2. Fig. 2, left, shows the maximum cliques in
LSE(0.4, 0.9). There are two cliques of size 5, namely (2, 4, 7, 10, 16) and (4, 7, 9, 10,
16). There are 106 cliques of size 5 in TSE(0.4). Fig. 2, center, shows the maximum
cliques, of size 8, in TSE(0.4). Only edges from $J_e(\mathrm{TSE}(0.4), 8) \setminus J_e(\mathrm{LSE}(0.4, 0.9), 5)$
are presented. There are two cliques of size 8, namely (2, 4, 7, 9, 10, 12, 16, 24) and
(2, 3, 4, 7, 9, 10, 12, 16). One can see that the clique (2, 4, 7, 9, 10, 12, 16, 24) contains
12 unreliable edges while the clique (2, 3, 4, 7, 9, 10, 12, 16) contains 14 unreliable edges.
Therefore the clique (2, 4, 7, 9, 10, 12, 16, 24) is more reliable than the clique (2, 3, 4, 7, 9,
10, 12, 16). Fig. 2, right, shows the maximum clique in USE(0.4, 0.9). Only edges
from $J_e(\mathrm{USE}(0.4, 0.9), 12) \setminus J_e(\mathrm{TSE}(0.4), 8)$ are presented. One can observe only one
clique of size 12, namely (2, 3, 4, 7, 8, 9, 10, 12, 16, 18, 24, 26). According to the
proposed method one can conclude that in the true threshold graph for threshold 0.4
there are no cliques of size greater than 12.

The proposed approach leads to the detection of cliques of maximum size by a step-by-step
methodology. For the case when the maximum clique in $\mathrm{LSE}(\gamma_0, P^*)$ is included in the
maximum clique in $\mathrm{USE}(\gamma_0, P^*)$ the methodology has the form:


1. detect the maximum clique $Cl_{lower}$ in $\mathrm{LSE}(\gamma_0, P^*)$ and the maximum clique $Cl_{upper}$
in $\mathrm{USE}(\gamma_0, P^*)$. Let the desired clique be $Cl = Cl_{lower}$;


2. add to the clique $Cl$ the node $l \in Cl_{upper} \setminus Cl$ with the maximum number of incident
reliable edges;


3. continue adding nodes while the proportion of unreliable edges is less than or equal to the given
threshold and $Cl_{upper} \setminus Cl \neq \emptyset$.
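The three steps above can be sketched as a greedy loop. The sketch assumes that the upper clique is a clique of the upper edge set, so that every pair not in the low edge set counts as an unreliable edge; the function name, the deterministic tie-breaking by node label and the parameter `max_unreliable_share` are assumptions made for the illustration.

```python
from itertools import combinations

def extend_clique(cl_lower, cl_upper, lse, max_unreliable_share):
    """Greedily extend a reliable clique inside a larger upper-set clique.

    lse : set of reliable edges as pairs (i, j) with i < j
    """
    def is_reliable(a, b):
        return (min(a, b), max(a, b)) in lse

    clique = list(cl_lower)
    candidates = set(cl_upper) - set(clique)
    while candidates:
        # Node with the most reliable edges to the current clique.
        best = max(sorted(candidates),
                   key=lambda v: sum(is_reliable(v, u) for u in clique))
        trial = clique + [best]
        pairs = list(combinations(trial, 2))
        share = sum(not is_reliable(a, b) for a, b in pairs) / len(pairs)
        if share > max_unreliable_share:
            break              # adding the node would exceed the threshold
        clique = trial
        candidates.remove(best)
    return sorted(clique)

# Toy example (illustrative values).
lse_edges = {(0, 1), (0, 2), (1, 2), (0, 3), (1, 3)}
result = extend_clique((0, 1, 2), (0, 1, 2, 3, 4), lse_edges, 0.2)
```

Here node 3 is added because the extended clique has one unreliable edge out of six, while adding node 4 would raise the share of unreliable edges to one half.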


An illustration of the step-by-step methodology is presented in fig. 3. We start
from a reliable clique of size 5, for example the clique (2, 4, 7, 10, 16) presented
in fig. 2, left. To obtain the most reliable clique of size 6 we add node 9, since there is
only one unreliable edge (2,9) between node 9 and the nodes of the clique (2, 4, 7, 10, 16).
The obtained clique is presented in fig. 3, left, where only edges incident to node
9 are shown. This clique contains only 1 unreliable edge out of 15 edges. Therefore
one can say that the reliability of the clique is $1 - \frac{1}{15}$. To obtain the most reliable clique of
size 7 we add node 24, since there are only four unreliable edges (4,24), (7,24), (9,24),
(10,24) between node 24 and the nodes of the clique (2,4,7,9,10,16). The obtained clique
is presented in fig. 3, center, where only edges incident to node 24 are shown.
This clique contains 5 unreliable edges out of 21 edges. Then one can say that the reliability
of the clique is $1 - \frac{5}{21}$. To obtain the most reliable clique of size 8 one can choose any node
outside the clique (2,4,7,9,10,16,24), since there are no nodes with reliable edges. Such a
clique is presented in fig. 3, right, where we add node 12. Only edges incident to
node 12 are shown. This clique contains 12 unreliable edges out of 28 edges. Then
one can say that the reliability of the clique is $1 - \frac{12}{28}$. Note that the reliability of any other
clique of size 8 which contains the clique (2,4,7,9,10,16,24) will be equal to $1 - \frac{12}{28}$ as well.
Note that the clique (2,4,7,9,10,16,24) is maximum clique of TSE(0.4).

$\mathrm{LSE}(\gamma_0, P^*)$, $\mathrm{TSE}(\gamma_0)$ and $\mathrm{USE}(\gamma_0, P^*)$ for $\gamma_0 = 0.5$, $P^* = 0.9$ are presented in
fig. 4, where the isolated nodes are dropped. In fig. 4, left, only edges
from LSE(0.5, 0.9) are presented. There are 9 reliable edges. TSE(0.5) contains 9
reliable as well as 27 unreliable edges. Edges belonging to TSE(0.5)\LSE(0.5, 0.9)
are drawn in fig. 4, center, and all of them are unreliable. Edges belonging to
Figure 1: LSE(0.4,0.9), TSE(0.4) and USE(0.4, 0.9).





Figure 2: Maximum cliques of LSE(0.4,0.9), TSE(0.4) and USE(0.4, 0.9). 





Figure 3: Cliques of size 6 (left), size 7 (center) and size 8 (right). $\gamma_0 = 0.4$
Figure 5: Maximum cliques of LSE(0.5,0.9), TSE(0.5) and USE(0.5, 0.9).


USE(0.5, 0.9)\TSE(0.5) are drawn in fig. 4, right, and all of them are unreliable
too. USE(0.5, 0.9) contains 119 edges.

Consider cliques of maximum size. The maximum cliques in LSE(0.5, 0.9), TSE(0.5) and 
USE(0.5, 0.9) are presented in fig. 5. Fig. 5, left, represents the maximum clique 
in LSE(0.5, 0.9). There is one clique of size 4, namely (4, 9, 10, 16). There are 45 
cliques of size 4 in TSE(0.5). Fig. 5, center, represents the maximum clique, of size 
7, in TSE(0.5). Only edges from Je(TSE(0.5), 7) \ Je(LSE(0.5, 0.9), 4) are presented. 
There is one clique of size 7, namely (2, 4, 9, 10, 12, 16, 24). One can see that the 
clique contains 7 reliable edges and 14 unreliable edges (edge (2,16) is drawn as a 
solid line, the other edges as dotted lines). Fig. 5, right, represents the maximum 
clique in USE(0.5, 0.9). Only edges from Je(USE(0.5, 0.9), 10) \ Je(TSE(0.5), 7) are 
presented. One can observe only one clique of size 10, namely 
(2, 3, 4, 7, 8, 9, 10, 12, 16, 24). According to the proposed method one can conclude 
that in the true threshold graph for threshold 0.5 there are no cliques of size 
greater than 10. 
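
The bound on clique size comes from inclusion: if, with probability at least P*, every 
edge of the true threshold graph belongs to the upper set, then every true clique is 
also a clique of the upper graph. The Python sketch below illustrates this on a toy 
graph with a brute-force search. The edge lists and the helper names is_clique and 
max_cliques are hypothetical; this is not the Dow Jones data. 

```python
from itertools import combinations


def is_clique(nodes, edges):
    """True if every pair of `nodes` is joined by an edge in `edges`."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


def max_cliques(nodes, edge_list):
    """Brute-force search: return (size, cliques) for the largest cliques."""
    edges = {frozenset(e) for e in edge_list}
    for size in range(len(nodes), 0, -1):
        found = [c for c in combinations(sorted(nodes), size) if is_clique(c, edges)]
        if found:
            return size, found
    return 0, []


# Hypothetical nested edge sets on five nodes: low is inside sample, sample inside upper.
nodes = [1, 2, 3, 4, 5]
low = [(1, 2), (1, 3), (2, 3)]
sample = low + [(1, 4), (2, 4), (3, 4)]
upper = sample + [(1, 5), (2, 5), (3, 5), (4, 5)]

for name, edge_list in [("low", low), ("sample", sample), ("upper", upper)]:
    size, cliques = max_cliques(nodes, edge_list)
    print(name, size, cliques)
```

In this toy example the maximum clique sizes are 3, 4 and 5, so any graph lying between 
the low and upper edge sets has a maximum clique of size between 3 and 5. The 
exhaustive search is exponential in the number of nodes and is meant for small 
illustrations only. 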

The illustration of the step-by-step methodology for γ0 = 0.5 is presented in fig. 6. 
We start from the reliable clique of size 4, (4, 9, 10, 16), which is presented in 
fig. 5, left. To obtain the most reliable clique of size 5 we add node 7, since there 
is one reliable edge (7,9) and there are three unreliable edges (7,10), (7,4), (7,16). 
The obtained clique is presented in fig. 6, left, where only edges incident to node 7 
are presented. This clique contains only 3 unreliable edges out of 10 edges. Therefore 
one can say that the reliability of the clique is 1 - 3/10. To obtain the most reliable 
clique of size 6 we add node 2, since there is one reliable edge (2,16) and four 
unreliable edges (2,4), (2,7), (2,9), (2,10). The obtained clique is presented in 
fig. 6, right, where only edges incident to node 2 are presented. This clique contains 
7 unreliable edges out of 15 edges, so one can say that the reliability of the clique 
is 1 - 7/15. Adding any further node from the remaining nodes leads to a clique of 
size 7 with 13 unreliable edges, so one can say that the reliability of any clique of 
size 7 obtained in this way is equal to 1 - 13/21. 
Figure 6: Cliques of size 5 (left), size 6 (right), γ0 = 0.5. 
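
One reading of the step-by-step extension described above is a greedy search: at each 
step, add the node that keeps the set a clique in the sample graph and brings the most 
reliable edges. The Python sketch below applies this to the six nodes named in the 
text, taking as reliable the six edges of (4, 9, 10, 16) together with (7,9) and 
(2,16), and assuming that all fifteen pairs are edges of TSE(0.5). The function name 
extend_clique and the tie-breaking rule are assumptions made for illustration. 

```python
from fractions import Fraction
from itertools import combinations


def extend_clique(start, candidates, sample_edges, reliable_edges):
    """Greedily grow `start` (at least two nodes); return [(clique, reliability), ...]."""
    clique = list(start)
    remaining = set(candidates) - set(clique)

    def reliability(nodes):
        pairs = [frozenset(p) for p in combinations(nodes, 2)]
        unreliable = sum(1 for p in pairs if p not in reliable_edges)
        return 1 - Fraction(unreliable, len(pairs))

    steps = [(tuple(clique), reliability(clique))]
    while True:
        # Only nodes joined to every current member keep the set a clique.
        feasible = [v for v in remaining
                    if all(frozenset((v, u)) in sample_edges for u in clique)]
        if not feasible:
            return steps
        # Prefer the node with the most reliable edges into the clique. Ties are
        # broken by the larger node index: an arbitrary choice that happens to
        # reproduce the order used in the text (node 7, then node 2).
        best = max(feasible, key=lambda v: (
            sum(frozenset((v, u)) in reliable_edges for u in clique), v))
        clique.append(best)
        remaining.remove(best)
        steps.append((tuple(clique), reliability(clique)))


nodes = [2, 4, 7, 9, 10, 16]
sample_edges = {frozenset(p) for p in combinations(nodes, 2)}
reliable_edges = {frozenset(p) for p in combinations([4, 9, 10, 16], 2)}
reliable_edges |= {frozenset((7, 9)), frozenset((2, 16))}

steps = extend_clique([4, 9, 10, 16], nodes, sample_edges, reliable_edges)
for clique, rel in steps:
    print(clique, rel)
```

Nodes 7 and 2 each bring exactly one reliable edge to the starting clique, so the 
first step is a tie; either order gives reliabilities 7/10 and then 8/15 for the 
cliques of size 5 and 6. 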


9 Conclusion 


A method for statistically significant detection of cliques is proposed in the article. 
The method is based on the construction of two sets, called the upper set of edges 
and the low set of edges. 

Important properties of these two sets are discussed in the article. It is proved that 
the set of cliques of fixed size k in the low set of edges is a low set of level P* for 
the set of true cliques of size k, and that the set of cliques of fixed size k in the 
upper set of edges is an upper set of level P* for the set of true cliques of size k. 

Several algorithms for constructing the upper and low sets of cliques, based on 
multiple hypotheses testing procedures, are discussed. A Holm-type procedure for 
simultaneous construction of the upper and low sets of edges is proposed. It is proved 
that this procedure simultaneously decreases the upper set of cliques and increases 
the low set of cliques without loss of significance. 
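
For orientation only, the classical Holm step-down procedure for generic p-values can 
be sketched in Python as follows. This is the textbook procedure, not the Holm-type 
procedure of the article, which works on edge hypotheses and builds both sets at once. 
The function name holm_reject and the p-values are made up for illustration. 

```python
def holm_reject(p_values, alpha):
    """Classical Holm step-down: return indices of rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for rank, i in enumerate(order):  # rank = 0 for the smallest p-value
        if p_values[i] <= alpha / (m - rank):
            rejected.append(i)
        else:
            break  # stop at the first hypothesis that is not rejected
    return rejected


# Hypothetical p-values for four individual hypotheses, family-wise level 0.1.
print(holm_reject([0.01, 0.04, 0.03, 0.50], alpha=0.1))  # [0, 2, 1]
```

The thresholds alpha/m, alpha/(m - 1), ... grow as hypotheses are rejected, which is 
why a step-down procedure can reject more than a single-step Bonferroni correction at 
the same family-wise error rate. 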

The proposed method complements the traditional way of clique detection. 
For example, one can divide the set of conclusions obtained by the traditional approach 
into reliable and unreliable ones. Moreover, it is possible to rank cliques of the same 
size by the number of reliable edges. On the other hand, using the proposed method it 
is easy to detect reliable cliques only. 

The proposed method also allows one to introduce a new concept of quasi-cliques. For 
example, one can define a quasi-clique as a clique in which the proportion of 
unreliable edges is less than a given number. 
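
A minimal Python sketch of such a test is given below, assuming that an edge is 
reliable when it belongs to the low set of edges. The function name is_quasi_clique, 
the parameter max_unreliable_share and the toy edge sets are hypothetical. 

```python
from itertools import combinations


def is_quasi_clique(nodes, sample_edges, reliable_edges, max_unreliable_share):
    """Clique in the sample graph whose share of unreliable edges is below the bound."""
    pairs = [frozenset(p) for p in combinations(nodes, 2)]
    if not pairs or any(p not in sample_edges for p in pairs):
        return False  # fewer than two nodes, or not a clique in the sample graph
    unreliable = sum(1 for p in pairs if p not in reliable_edges)
    return unreliable / len(pairs) < max_unreliable_share


# Toy data: a complete sample graph on four nodes, reliable edges only among 1, 2, 3.
sample_edges = {frozenset(p) for p in combinations([1, 2, 3, 4], 2)}
reliable_edges = {frozenset(p) for p in combinations([1, 2, 3], 2)}

print(is_quasi_clique([1, 2, 3, 4], sample_edges, reliable_edges, 0.6))  # True
print(is_quasi_clique([1, 2, 3, 4], sample_edges, reliable_edges, 0.5))  # False
```

In the toy data three of the six edges are unreliable, so the clique passes a bound of 
0.6 but not a bound of 0.5, because the inequality is strict. 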

Moreover, the method allows one to introduce a step-by-step procedure for clique 
detection. Note that the step-by-step procedure sometimes leads to cliques 
different from those obtained by the traditional methodology. 
11 Appendix 


List of tickers of Dow Jones index: 
Index in graph  Ticker  Company Name 
0               AXP     American Express Company 
1               AMGN    Amgen Inc. 
2               AAPL    Apple Inc. 
3               BA      The Boeing Company 
4               CAT     Caterpillar Inc. 
5               CSCO    Cisco Systems, Inc. 
6               CVX     Chevron Corporation 
7               GS      The Goldman Sachs Group, Inc. 
8               HD      The Home Depot, Inc. 
9               HON     Honeywell International Inc. 
10              IBM     International Business Machines Corporation 
11              INTC    Intel Corporation 
12              JNJ     Johnson & Johnson 
13              KO      The Coca-Cola Company 
14              JPM     JPMorgan Chase & Co. 
15              MCD     McDonald’s Corporation 
16              MMM     3M Company 
17              MRK     Merck & Co., Inc. 
18              MSFT    Microsoft Corporation 
19              NKE     NIKE, Inc. 
20              PG      The Procter & Gamble Company 
21              TRV     The Travelers Companies, Inc. 
22              UNH     UnitedHealth Group Incorporated 
23              CRM     Salesforce.com, inc. 
24              VZ      Verizon Communications Inc. 
25              V       Visa Inc. 
26              WBA     Walgreens Boots Alliance, Inc. 
27              WMT     Walmart Inc. 
28              DIS     The Walt Disney Company 
29              DOW     Dow Inc. 